Abstract

In this report, we focus on the choice of the kernel for identification 
of a linear time-invariant (LTI) dynamical system from the knowledge of
the input and a finite set of output observations. We provide guidelines 
to design the kernel function so as to enforce different types of prior in- 
formation about the dynamical system under study. On one hand, we 
characterize general families of kernels that incorporate information such 
as smoothness, stability, relative degree, absence of oscillatory behavior, 
or delay. On the other hand, we show that certain popular kernels for 
curve fitting are not well suited for the identification of stable dynamical 
systems. 
1 Introduction

Identification of LTI systems is a classical and very well studied problem, see
e.g. [8]. Although the standard approach is based on parametric models,
non-parametric kernel-based techniques have recently been shown to achieve
remarkably better performance under a variety of circumstances [4, 9, 11], thus
renewing the attention on the subject.

In this report, we consider a formulation based on regularization in reproducing
kernel Hilbert spaces (RKHS) [1]. The setup is general enough to take into
account both discrete and continuous-time linear time- invariant systems, allow- 
ing for sparse time sampling of the output signal. For the sake of simplicity, 
we focus on the case of SISO (single input single output) linear time invariant 
system identification, where the goal is to reconstruct a scalar impulse response 
function from the knowledge of the input signal and a finite set of output mea- 
surements. Nevertheless, the ideas presented in this report are general enough 
to be naturally extendable to more complex and structured problems. 

Much effort has been devoted in the literature to developing function spaces that
model certain desirable properties of signals. Somewhat surprisingly, considerably
less effort has been devoted to designing function spaces for modeling
impulse responses of linear dynamical systems. Arguably, the two modeling 
problems should be handled differently. In this report, we characterize families 
of RKHS that encode those properties that are specific to impulse responses 
of dynamical systems, such as causality, stability, absence of oscillations, rel- 
ative degree, delay, and so on. We build upon a collection of classical results 
about RKHS, and show how they can be applied to enforce different types of 
prior knowledge about a dynamical system by simply designing a suitable kernel 
function. 



2 Regularization for LTI system identification 

In order to handle both continuous and discrete-time systems in a unified
framework, we refer to an abstract time set $\mathcal{T}$ that is a subgroup of
$(\mathbb{R}, +)$. Given an input signal $u : \mathcal{T} \to \mathbb{R}$, an LTI
system generates an output signal $y : \mathcal{T} \to \mathbb{R}$ according to
the convolution equation

$$y(t) = (h * u)(t) = \int_{\mathcal{T}} u(\tau)\, h(t - \tau)\, d\tau,$$

where $h : \mathcal{T} \to \mathbb{R}$ is the impulse response, and the integral is
taken with respect to the Lebesgue measure. Clearly, if the time set is discrete,
the convolution integral reduces to a series.

In the following, we study the problem of identifying the impulse response, 
assuming availability of the input signal and a finite dataset of output measure- 
ment pairs 

$$\mathcal{D} = \{(t_1, y_1), \ldots, (t_\ell, y_\ell)\}.$$

This is a classical ill-posed deconvolution problem that can be tackled by means 
of regularization techniques [inilll]. Consider, for instance, a regularized least 
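squares approach of the type

$$\min_{h \in \mathcal{H}} \left( \sum_{i=1}^{\ell} \bigl( y_i - (h * u)(t_i) \bigr)^2 + \lambda \|h\|_{\mathcal{H}}^2 \right), \qquad (1)$$

where $\mathcal{H}$ is a Hilbert space of functions, and $\lambda > 0$ is a
regularization parameter. Assume that the input signal and the space
$\mathcal{H}$ are such that all the point-wise evaluated convolutions are bounded
linear functionals, namely, for all $i = 1, \ldots, \ell$, there exists a finite
constant $C_i$ such that

$$|(h * u)(t_i)| \le C_i \|h\|_{\mathcal{H}}, \quad \forall h \in \mathcal{H}.$$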



Then, by the Riesz representation theorem [6, 12], there exist unique
representers $w_i \in \mathcal{H}$ such that

$$(h * u)(t_i) = \langle h, w_i \rangle_{\mathcal{H}}.$$



In addition, one can show that any optimal solution of (1) can be expressed
as a linear combination of the representers (a result known as the representer
theorem [7]):

$$h = \sum_{i=1}^{\ell} c_i w_i. \qquad (2)$$



In view of (2), the regularization problem reduces to determining a vector of
coefficients $c_i$ whose dimension equals the number of observations. More
precisely, an optimal vector of coefficients $c \in \mathbb{R}^{\ell}$ can be
obtained by simply solving a linear system of the form

$$(\mathbf{K} + \lambda \mathbf{I})\, c = y,$$

where $y \in \mathbb{R}^{\ell}$ denotes the vector of output observations, and the
entries of the kernel matrix $\mathbf{K}$ are given by

$$\mathbf{K}_{ij} = \langle w_i, w_j \rangle_{\mathcal{H}}.$$

2.1 Reproducing Kernel Hilbert Spaces 

Reproducing kernel Hilbert spaces are a family of Hilbert spaces that enjoy par- 
ticularly favorable properties from the point of view of regularization. As their 
name suggests, they are strongly linked with the concept of positive semidefinite
kernel. Given a non-empty set $\mathcal{X}$, a positive semidefinite kernel is a
symmetric function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ such that

$$\sum_{i=1}^{\ell} \sum_{j=1}^{\ell} c_i c_j K(x_i, x_j) \ge 0, \quad \forall (x_i, c_i) \in (\mathcal{X}, \mathbb{R}).$$

A RKHS is a space of functions $h : \mathcal{X} \to \mathbb{R}$ such that
point-wise evaluation functionals are bounded. This means that, for any
$x \in \mathcal{X}$, there exists a finite constant $C_x$ such that

$$|h(x)| \le C_x \|h\|_{\mathcal{H}}, \quad \forall h \in \mathcal{H}.$$

Given a RKHS, it can be shown that there exists a unique symmetric and 
positive semidefinite kernel function K (called the reproducing kernel), such 
that the so-called reproducing property holds: 

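$$h(x) = \langle h, K_x \rangle_{\mathcal{H}}, \quad \forall (x, h) \in \mathcal{X} \times \mathcal{H},$$

where the kernel sections $K_x$ are defined as

$$K_x(y) = K(x, y), \quad \forall y \in \mathcal{X}.$$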

The reproducing property states that the representers of point-wise evaluation
functionals coincide with the kernel sections. Starting from the reproducing
property, it is also easy to show that the representer of any bounded linear
functional $L$ is a function $K_L \in \mathcal{H}$ such that

$$K_L(x) = L K_x, \quad \forall x \in \mathcal{X}.$$

Therefore, in a RKHS, the representer of any bounded linear functional can be 
obtained explicitly in terms of the reproducing kernel. 

With reference to the deconvolution problem (1), we are interested in estimating
functions defined over the time set $\mathcal{X} = \mathcal{T}$. By expressing
the representers in terms of the kernel, the optimal solution (2) can be
rewritten in the explicit form

$$h(t) = \sum_{i=1}^{\ell} c_i\, (u * K_t)(t_i).$$

Finally, the entries of the kernel matrix $\mathbf{K}$ can be expressed as

$$\mathbf{K}_{ij} = \int_{\mathcal{T}} \int_{\mathcal{T}} u(t_i - \tau_1)\, u(t_j - \tau_2)\, K(\tau_1, \tau_2)\, d\tau_1\, d\tau_2.$$
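The procedure of this section can be sketched in discrete time. The snippet below is only an illustration under stated assumptions: the time set is the integers, the input is zero before time 0, the impulse response is truncated to `n` lags, and the kernel is the exponential kernel of Section 4.3 sampled on the integers. The helper names `input_matrix` and `identify`, as well as the toy data, are hypothetical and do not come from the report.

```python
import numpy as np

def input_matrix(u, t_obs, n):
    """U[i, a] = u(t_i - a) for lags a = 0..n-1 (input assumed zero outside its record)."""
    U = np.zeros((len(t_obs), n))
    for i, t in enumerate(t_obs):
        for a in range(n):
            if 0 <= t - a < len(u):
                U[i, a] = u[t - a]
    return U

def identify(u, t_obs, y, K, lam):
    """Solve (K_mat + lam I) c = y and return h(t) = sum_i c_i (u * K_t)(t_i)."""
    U = input_matrix(u, t_obs, K.shape[0])
    K_mat = U @ K @ U.T                      # discrete analogue of the double integral
    c = np.linalg.solve(K_mat + lam * np.eye(len(y)), y)
    return K @ U.T @ c                       # impulse response on lags 0..n-1

rng = np.random.default_rng(0)
n = 30
h_true = 0.8 ** np.arange(n)                 # stable exponential impulse response
K = np.outer(h_true, h_true)                 # exponential kernel with e^{-omega} = 0.8
u = rng.standard_normal(60)                  # toy input signal
t_obs = np.array([5, 12, 20, 33, 41, 58])    # sparse output sampling instants
y = input_matrix(u, t_obs, n) @ h_true       # noise-free output measurements
h_hat = identify(u, t_obs, y, K, lam=1e-6)
```

Only six output samples are used, yet the estimate matches the true response closely because the true response lies in the (one-dimensional) RKHS of this particular kernel. With a richer kernel or noisy data, the regularization parameter would have to be tuned.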



3 Basic properties of the impulse response 

By searching the impulse response in a RKHS, we are automatically assuming 
that h is point-wise well-defined and bounded over compact time sets. In this 
section, we show how several other important properties of the impulse response 
can be enforced by simply designing a suitable kernel function. 



3.1 Causality 

A dynamical system is said to be causal if the value of the output signal at a 
certain time instant T does not depend on values of the input in the future (for 
t > T). Causality is a prior knowledge that is virtually always incorporated in 
the model of a dynamical system. This is done already when the signals are 
classified as inputs or outputs of the system: the value of the output signals at 
a certain time is not allowed to depend on the values of the input signals in the 
future, whereas the other way around is possible. 

For a LTI system, causality is equivalent to vanishing of the impulse response 
for negative times, namely 

$$h(t) = 0, \quad \forall t < 0, \quad \forall h \in \mathcal{H}. \qquad (3)$$

The following Lemma characterizes those RKHS that contain causal impulse 
responses, with a simple condition on the kernel function. 



Lemma 1 The RKHS $\mathcal{H}$ contains only causal impulse responses if and only
if the reproducing kernel satisfies

$$K(t_1, t_2) = H(t_1)\, H(t_2)\, K_+(t_1, t_2), \qquad (4)$$

where $H(t)$ is the Heaviside function defined as

$$H(t) = \begin{cases} 1, & t \ge 0 \\ 0, & \text{else} \end{cases}$$

and $K_+$ is a kernel defined for non-negative times.

From the simple result of Lemma 1, we can already see that the kernels needed
for modeling impulse responses of dynamical systems are quite different from
the typical kernels used for curve fitting. In order to encode a "privileged"
direction in the time flow, they have to be asymmetric on the real line, and can
also be discontinuous.



3.2 Stability 

System stability is an important prior information that should be always incor- 
porated in any identification method, whenever available. Perhaps, the most 
intuitive notion of stability is the so called BIBO (Bounded Input Bounded 
Output) condition that can be expressed as 

$$\|u\|_{\infty} < +\infty \;\Rightarrow\; \|y\|_{\infty} < +\infty,$$

where $\|\cdot\|_{\infty}$ denotes the $L^{\infty}$ norm. Assuming BIBO stability
entails that the output of the system to be identified cannot diverge when
excited with a bounded input signal. It is well known that, for an LTI system,
BIBO stability is equivalent to integrability of the impulse response:

$$\int_{\mathcal{T}} |h(t)|\, dt < +\infty. \qquad (5)$$



Since stability is very often known to be satisfied by the system under study, it is 
interesting to characterize those reproducing kernel Hilbert spaces that contain 
only integrable functions. The following Lemma gives a necessary and sufficient 
condition (see e.g. [3]): 

Lemma 2 The RKHS $\mathcal{H}$ is a subspace of $L^1(\mathcal{T})$ if and only if

$$\int_{\mathcal{T}} \left| \int_{\mathcal{T}} K(t_1, t_2)\, h(t_1)\, dt_1 \right| dt_2 < +\infty, \quad \forall h \in L^{\infty}(\mathcal{T}).$$

We can talk about stability of the kernel, with reference to kernels that satisfy
the conditions of Lemma 2. It can be easily verified that integrability of the
kernel is a sufficient condition for $\mathcal{H}$ to be a subspace of
$L^1(\mathcal{T})$.

Lemma 3 If $K \in L^1(\mathcal{T}^2)$, then $\mathcal{H}$ is a subspace of $L^1(\mathcal{T})$.

It is worth observing that the condition of Lemma 3 is also necessary for
nonnegative-valued kernels (i.e. such that $K(t_1, t_2) \ge 0$ for all
$t_1, t_2$), as can be seen by simply setting $h = 1$ in Lemma 2.
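As a numerical illustration of the sufficient condition of Lemma 3, the sketch below approximates the integral of $|K|$ over the square $[0, T]^2$ for growing horizons $T$, for two kernels on non-negative times: an exponential one and the constant (Heaviside) one. The function name `l1_mass`, the grid size, and the choice $\omega = 1$ are assumptions made for the example.

```python
import numpy as np

def l1_mass(K_plus, T, n=2001):
    """Trapezoidal approximation of the integral of |K_plus| over [0, T]^2."""
    t = np.linspace(0.0, T, n)
    dt = t[1] - t[0]
    w = np.full(n, dt)
    w[0] = w[-1] = dt / 2                     # 1-D trapezoidal weights
    G = np.abs(K_plus(t[:, None], t[None, :]))
    return w @ G @ w

omega = 1.0
exp_kernel = lambda a, b: np.exp(-omega * (a + b))   # exponential kernel on t >= 0
step_kernel = lambda a, b: np.ones_like(a + b)       # Heaviside kernel on t >= 0

masses_exp = [l1_mass(exp_kernel, T) for T in (5, 10, 20)]
masses_step = [l1_mass(step_kernel, T) for T in (5, 10, 20)]
```

The exponential kernel's mass levels off near $1/\omega^2$ as $T$ grows, whereas the mass of the constant kernel grows like $T^2$, so only the former is integrable.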



3.3 Delay 

In view of causality, the value of the output signal at a certain time doesn't 
depend on the values of the input signal in the future. Let D denote the smallest 
time instant such that $h(t) \neq 0$:

$$D := \inf \{ t \in \mathcal{T} : h(t) \neq 0 \}.$$

By causality, $D$ has to be nonnegative. If it is strictly positive, then the
system is said to exhibit an input-output delay equal to $D$, meaning that $y(t)$
does not depend on $u(\tau)$ for any $\tau > t - D$. Once again, the prior
knowledge of the delay $D$ can be easily incorporated in the kernel function.

Lemma 4 Every impulse response $h \in \mathcal{H}$ has a delay equal to $D$ if
and only if the reproducing kernel is in the form

$$K_D(t_1, t_2) = K(t_1 - D, t_2 - D),$$

with $K$ in the form (4).

If the value of $D$ is unknown in advance, it can be treated as a hyperparameter
to be estimated from the data.
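The change of variable in Lemma 4 amounts to a thin wrapper around any causal kernel. The sketch below assumes a causal exponential kernel with unit rate as the base kernel; the names `delayed` and `exp_kernel` are hypothetical.

```python
import numpy as np

def exp_kernel(t1, t2, omega=1.0):
    """Causal exponential kernel: zero unless both arguments are non-negative."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    causal = (t1 >= 0) & (t2 >= 0)
    return np.where(causal, np.exp(-omega * (t1 + t2)), 0.0)

def delayed(K, D):
    """Return K_D(t1, t2) = K(t1 - D, t2 - D) for a given delay D."""
    return lambda t1, t2: K(t1 - D, t2 - D)

K_D = delayed(exp_kernel, D=2.0)   # kernel sections vanish for times below D = 2
```

When $D$ is treated as a hyperparameter, one possible approach is to rebuild the kernel matrix for each candidate delay and select the value by a validation criterion.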



4 Kernels for continuous-time systems

In this section, we focus on some properties of continuous-time systems ($\mathcal{T} = \mathbb{R}$),
such as smoothness of the impulse response and relative degree, and discuss how 
to enforce them by choosing suitable kernels. 

4.1 Smoothness 

Consider a continuous-time LTI system without delay (the delayed case can be
simply handled via the change of variable discussed in Lemma 4). Impulse
responses for continuous-time dynamical systems are typically assumed to have
some degree of smoothness. Smoothness can be expressed in terms of continuity
of $h$ and a certain number of time derivatives, everywhere with the possible
exception of the origin. Regularity of the impulse response at $t = 0$ is related
to the concept of relative degree, which is important enough to deserve an
independent treatment (see the next subsection). Impulse responses with a high
number of continuous derivatives correspond to low-pass dynamical systems
that attenuate high frequencies of the input signal. It is known, see e.g. [14],
that regularity of the kernel propagates to every function in the RKHS.
Therefore, prior knowledge about smoothness of the impulse response can be
directly expressed in terms of the kernel function.

Lemma 5 Let $\mathcal{H}$ denote a RKHS associated with a kernel of the form (4)
with $\mathcal{T} = \mathbb{R}$. If the kernel $K_+$ for non-negative times is
$k$-times continuously differentiable on $(0, +\infty)^2$, then the restriction of
every function $h \in \mathcal{H}$ to $(0, +\infty)$ is $k$-times continuously
differentiable. In addition, point-wise evaluated derivatives are continuous
linear functionals, i.e. for all $t > 0$ and $i \le k$, there exists
$C < +\infty$ such that

$$|h^{(i)}(t)| \le C \|h\|_{\mathcal{H}}, \quad \forall h \in \mathcal{H}.$$



4.2 Relative degree 

The relative degree is an important concept for continuous-time dynamical sys- 
tems. In many cases, prior knowledge about the relative degree is available 
thanks to simple physical considerations. The relative degree of an LTI system
is directly linked to the regularity of the impulse response at $t = 0$ (or
$t = D$ in the delayed case). In view of Lemma 1, all the left derivatives of the
impulse response (with the convention $h^{(0)} = h$) have to vanish:

$$h^{(k)}(0^-) = 0, \quad \forall k \ge 0.$$

On the other hand, the right derivatives may well be different from zero.
Assuming existence of all the necessary derivatives, the relative degree of an
LTI system is a natural number $k$ such that

$$h^{(i)}(0^+) = 0, \quad \forall i < k, \qquad h^{(k)}(0^+) \neq 0.$$

If $h^{(i)}(0^+) = 0$ for all $i$, the relative degree is undefined.

If the relative degree is $k$, then the $k$-th derivative of the impulse response
at $t = 0$ is discontinuous. Let us represent the impulse response in the form
$h(t) = H(t)\, h_+(t)$, where $H$ is the Heaviside step function, and assume that
$h_+(t)$ admits at least $k$ right derivatives at $t = 0$. By using distributional
derivatives and properties of the convolution, we have

$$\begin{aligned}
y^{(k+1)}(t) = (h^{(k+1)} * u)(t)
&= h_+^{(k)}(0^+)\, (\delta_0 * u)(t) + \int_{-\infty}^{t} u(\tau)\, h_+^{(k+1)}(t - \tau)\, d\tau \\
&= h_+^{(k)}(0^+)\, u(t) + \int_{-\infty}^{t} u(\tau)\, h_+^{(k+1)}(t - \tau)\, d\tau.
\end{aligned}$$

The $(k+1)$-th time derivative of the output is the first derivative that is
directly influenced by the input $u(t)$. Therefore, the system exhibits an
input-output integral effect equivalent to a chain of $k$ integrators on the
input of a system with relative degree one.

Prior knowledge about the relative degree of the system can be enforced by
designing the kernel according to the following Lemma.

Lemma 6 Under the assumptions of Lemma 5, every impulse response
$h \in \mathcal{H}$ has relative degree greater than or equal to $k$ if and only if

$$\forall t \in \mathbb{R}, \quad \lim_{\tau \to 0^+} K_t^{(i)}(\tau) = 0, \quad \forall i < k. \qquad (6)$$

Hence, when the impulse response is searched within an RKHS, the relative
degree of the identified system is directly related to the simple property (6) of
the kernel function. We can therefore introduce the concept of relative degree 
of the kernel. 



4.3 Examples 

The simplest possible kernel for impulse response identification is perhaps the 
Heaviside kernel: 

$$K(t_1, t_2) = H(t_1)\, H(t_2),$$

whose associated RKHS contains only step functions. This kernel has relative
degree equal to one and is clearly not stable. As a second example, consider the
exponential kernel

$$K(t_1, t_2) = H(t_1)\, H(t_2)\, e^{-\omega (t_1 + t_2)}. \qquad (7)$$

This kernel is infinitely differentiable everywhere, except over the lines
$t_1 = 0$ and $t_2 = 0$, where it is discontinuous. Since $K$ is discontinuous,
the relative degree is equal to one. The associated Hilbert space $\mathcal{H}$
contains exponentially
decreasing functions. 



5 Families of kernels for system identification 

The exponential kernel defined in (7) satisfies the sufficient condition of
Lemma 3, therefore the associated RKHS contains stable impulse responses of
relative degree one (in fact, the space contains only stable exponential
functions). Now, assume that a kernel $K_1$ with relative degree one is
available. Then, we can easily generate a family of kernels of arbitrary relative
degree via the following recursive procedure:

$$K_{i+1}(t_1, t_2) = \int_{-\infty}^{t_1} \int_{-\infty}^{t_2} K_i(\tau_1, \tau_2)\, d\tau_1\, d\tau_2, \quad i \ge 1.$$

Unfortunately, the application of such a procedure doesn't preserve integrability
of the original kernel $K_1$. Consider for example the exponential kernel (7).
Although $K_1$ is stable, all the other kernels $K_i$ with $i \ge 2$ do not
satisfy the necessary condition of Lemma 2 (to see this, it is sufficient to
choose $h = 1$) and are therefore not BIBO stable. In the following, we describe
some alternative
ways of constructing families of stable kernels. 
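
As a worked check, assume that $K_1$ is the exponential kernel (7) with $\omega > 0$. One step of the recursion can then be computed in closed form:

$$K_2(t_1, t_2) = \int_{-\infty}^{t_1} \int_{-\infty}^{t_2} H(\tau_1)\, H(\tau_2)\, e^{-\omega(\tau_1 + \tau_2)}\, d\tau_1\, d\tau_2 = H(t_1)\, H(t_2)\, \frac{\left(1 - e^{-\omega t_1}\right)\left(1 - e^{-\omega t_2}\right)}{\omega^2}.$$

For any fixed $t_2 > 0$, $K_2(t_1, t_2)$ tends to the positive constant $(1 - e^{-\omega t_2})/\omega^2$ as $t_1 \to +\infty$, so the inner integral in Lemma 2 with $h = 1$ already diverges.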



5.1 A family of stable kernels 

In this subsection, we discuss a technique to construct stable kernels of any 
relative degree. The key idea to obtain stability is to introduce a change of 
coordinates that maps $\mathbb{R}^+$ into the finite interval $[0, 1]$, and then
use a kernel over the unit square $[0, 1]^2$. Let $G : [0, 1]^2 \to \mathbb{R}$
denote a positive semidefinite kernel, and $h_\omega : \mathbb{R}^+ \to [0, 1]$
denote the exponential coordinate transformation

$$h_\omega(t) = e^{-\omega t}.$$

Then, we can construct a class of kernels defined as in (4), where the kernel
$K_+$ for non-negative times is given by

$$K_+(t_1, t_2) = (t_1 t_2)^k \int_{\mathbb{R}^+} G(h_\omega(t_1), h_\omega(t_2))\, d\mu(\omega), \quad k \in \mathbb{N}, \qquad (8)$$

and $\mu$ is a probability measure. If $G(h_\omega(t_1), h_\omega(t_2))$ is a
kernel with relative degree one, we can immediately check, using Lemma 6, that
the kernel (8) has relative degree $k + 1$. To ensure BIBO stability, the mass of
$\mu(\omega)$ should not be concentrated around zero and the kernel $G$ must
vanish sufficiently fast around the origin. The following Lemma gives a
sufficient condition.

Lemma 7 Let $G : [0, 1]^2 \to \mathbb{R}$ denote a kernel such that

$$\left| \frac{G(s_1, s_2)}{s_1 s_2} \right| \le C, \quad \forall (s_1, s_2) \in [0, 1]^2. \qquad (9)$$

If the support of $\mu$ does not contain the origin, then the kernel (8) is BIBO
stable for all $k \in \mathbb{N}$.



An example is the recently proposed stable spline kernel [5], obtained by
choosing $G$ as the cubic spline kernel (that can also be seen as the covariance
function of an integrated Wiener process on $\mathbb{R}^+$):

$$G(s_1, s_2) = \frac{s_1 s_2 \min\{s_1, s_2\}}{2} - \frac{\min\{s_1, s_2\}^3}{6}.$$

A simple calculation shows that condition (9) is satisfied with $C = 1/3$. By
using (8), we can start from this kernel to generate a class of stable kernels of
arbitrary relative degree. For example, by choosing $\mu$ as the unit mass on a
certain frequency $\omega > 0$, we obtain the kernel

$$K_+(t_1, t_2) = (t_1 t_2)^k\, G(h_\omega(t_1), h_\omega(t_2)) = (t_1 t_2)^k \left( \frac{e^{-\omega (t_1 + t_2 + \max\{t_1, t_2\})}}{2} - \frac{e^{-3 \omega \max\{t_1, t_2\}}}{6} \right),$$

which is stable and has relative degree equal to $k + 1$.
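The construction of this subsection can be checked numerically. The sketch below evaluates the ratio appearing in condition (9) for the cubic spline kernel on a grid, and verifies that a Gram matrix of the resulting stable spline kernel has no significantly negative eigenvalues. The function names, the grids, and the values $\omega = 0.7$ and $k = 1$ are arbitrary choices made for illustration.

```python
import numpy as np

def cubic_spline(s1, s2):
    """Cubic spline kernel G(s1, s2) on the unit square."""
    m = np.minimum(s1, s2)
    return s1 * s2 * m / 2 - m**3 / 6

def stable_spline(t1, t2, omega=1.0, k=0):
    """Causal kernel H(t1) H(t2) (t1 t2)^k G(exp(-omega t1), exp(-omega t2))."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    causal = (t1 >= 0) & (t2 >= 0)
    a, b = np.maximum(t1, 0.0), np.maximum(t2, 0.0)   # avoid overflow for t < 0
    K_plus = (a * b) ** k * cubic_spline(np.exp(-omega * a), np.exp(-omega * b))
    return np.where(causal, K_plus, 0.0)

# Ratio |G(s1, s2)| / (s1 s2) from condition (9), evaluated on a grid of (0, 1]^2
s = np.linspace(1e-3, 1.0, 400)
S1, S2 = np.meshgrid(s, s)
ratio_max = np.max(np.abs(cubic_spline(S1, S2)) / (S1 * S2))

# Gram matrix on a time grid; its smallest eigenvalue should be (numerically) >= 0
t = np.linspace(0.0, 8.0, 150)
gram = stable_spline(t[:, None], t[None, :], omega=0.7, k=1)
eig_min = np.linalg.eigvalsh(gram).min()
```

On this grid the largest ratio is attained at $s_1 = s_2 = 1$ and equals $1/3$, in agreement with the constant quoted above.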

5.2 Kernels for relaxation systems 

Many real-world systems, such as reciprocal electrical networks whose energy
storage elements are of the same type, or mechanical systems in which inertial
effects may be neglected, have the property that the impulse response never
exhibits oscillations. Relaxation systems, see e.g. [18], are dynamical systems
whose impulse response is a so-called completely monotone function. An
infinitely differentiable function $f : \mathbb{R}^+ \to \mathbb{R}$ is called
completely monotone if

$$(-1)^n f^{(n)}(t) \ge 0, \quad \forall n \in \mathbb{N}, \; t > 0.$$

The following characterization of completely monotone functions [2, 17] is a
convenient tool that allows us to generalize the simple exponential kernel
defined in (7).

Theorem 1 (Bernstein-Widder) An infinitely differentiable real-valued function
$f$ defined on the positive real line is completely monotone if and only if there
exists a non-negative finite Borel measure $\mu$ on $\mathbb{R}^+$ such that

$$f(t) = \int_{\mathbb{R}^+} e^{-t \omega}\, d\mu(\omega).$$



In view of this last theorem, completely monotone functions are characterized 
as mixture of decreasing exponentials or, in other words, as Laplace transforms 
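of non-negative measures. Let $f$ denote a completely monotone function, and
consider the family of functions of the form

$$K(t_1, t_2) = H(t_1)\, H(t_2)\, f(t_1 + t_2). \qquad (10)$$

By Theorem 1, we can easily verify that (10) defines a positive semidefinite
kernel:

$$\sum_{i=1}^{\ell} \sum_{j=1}^{\ell} c_i c_j K(t_i, t_j) = \int_{\mathbb{R}^+} \left( \sum_{i=1}^{\ell} c_i H(t_i)\, e^{-t_i \omega} \right)^2 d\mu(\omega) \ge 0.$$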

Clearly, not every function in the associated RKHS is a completely monotone 
impulse response. However, all the kernel sections Kt are completely monotone. 

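Now, observe that, unless $f = 0$, the relative degree of kernel (10) is always
one. Indeed, if the relative degree is greater than one, then we have

$$f(t) = K_t(0^+) = 0, \quad \forall t \in \mathbb{R}^+.$$

By using Lemma 3, we can check that, when the support of $\mu$ does not contain
the origin, the kernel (10) is BIBO stable:

$$\int_{\mathbb{R}^2} |K(t_1, t_2)|\, dt_1\, dt_2 = \int_{\mathbb{R}^+} \left( \int_{\mathbb{R}^+} e^{-t \omega}\, dt \right)^2 d\mu(\omega) = \int_{\mathbb{R}^+} \frac{d\mu(\omega)}{\omega^2} \le \frac{1}{\underline{\omega}^2} \int_{\mathbb{R}^+} d\mu(\omega) < +\infty,$$

where $\underline{\omega} > 0$ denotes the infimum of the support of $\mu$.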
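As an illustration, the sketch below takes a discrete measure made of three point masses away from the origin, builds the corresponding completely monotone function as a mixture of decaying exponentials, and checks both positive semidefiniteness of a Gram matrix and the closed-form value of the kernel's integral. The rates, weights, grid, and variable names are arbitrary assumptions for the example.

```python
import numpy as np

omegas = np.array([0.5, 1.0, 3.0])     # support of the measure, away from the origin
weights = np.array([0.2, 0.5, 0.3])    # non-negative masses

def f(t):
    """Completely monotone function: mixture of decaying exponentials."""
    t = np.asarray(t, dtype=float)
    return (np.exp(-t[..., None] * omegas) * weights).sum(axis=-1)

t = np.linspace(0.0, 40.0, 801)
gram = f(t[:, None] + t[None, :])      # kernel f(t1 + t2) restricted to t1, t2 >= 0
eig_min = np.linalg.eigvalsh(gram).min()

# Integral of the kernel: closed form versus a trapezoidal approximation on [0, 40]^2
l1_closed_form = (weights / omegas**2).sum()
dt = t[1] - t[0]
w = np.full(t.size, dt)
w[0] = w[-1] = dt / 2
l1_numeric = w @ gram @ w
```

The two values of the integral agree up to discretization error, and the smallest eigenvalue of the Gram matrix is zero up to rounding, as expected for a kernel of rank three.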

On the other hand, if the support of $\mu$ contains the origin, we may obtain
unstable kernels. For instance, when $\mu$ is the unit mass centered at the
origin, we obtain the Heaviside kernel $H(t_1) H(t_2)$, which is not stable.
Finally, observe that not all the kernels of the form (10) that vanish when
$t_1$ or $t_2$ tend to $+\infty$ are stable, as shown by the following
counterexample:

$$K(t_1, t_2) = \frac{H(t_1)\, H(t_2)}{1 + (t_1 + t_2)^2}.$$

This kernel is indeed of the type (10), since the function $(1 + t^2)^{-1}$ is
completely monotone. However, the necessary condition of Lemma 2 is not
satisfied with $h = 1$:

$$\begin{aligned}
\int_{\mathbb{R}^+} \left| \int_{\mathbb{R}^+} \frac{dt_1}{1 + (t_1 + t_2)^2} \right| dt_2
&= \int_{\mathbb{R}^+} \left( \frac{\pi}{2} - \tan^{-1}(t_2) \right) dt_2 \\
&= \left[ t_2 \left( \frac{\pi}{2} - \tan^{-1}(t_2) \right) \right]_0^{+\infty} + \int_{\mathbb{R}^+} \frac{t_2}{1 + t_2^2}\, dt_2 \\
&= 1 + \left[ \frac{1}{2} \log(1 + t_2^2) \right]_0^{+\infty} = +\infty.
\end{aligned}$$




5.3 Translation invariant kernels are not stable 



In contrast with (10), consider now kernels of the type

$$K(t_1, t_2) = H(t_1)\, H(t_2)\, f(t_1 - t_2). \qquad (11)$$

The following classical result [13] characterizes the class of functions such
that $f(t_1 - t_2)$ is a positive semidefinite kernel.

Theorem 2 (Schoenberg) Let $f : \mathbb{R} \to \mathbb{R}$ denote a continuous
function. Then, $f(t_1 - t_2)$ is a positive semidefinite kernel if and only if
there exists a non-negative finite Borel measure $\mu$ on $\mathbb{R}^+$ such that

$$f(t) = \int_{\mathbb{R}^+} \cos(t \omega)\, d\mu(\omega).$$

Hence, when $f$ is the cosine transform of a non-negative measure, the functions
of the form (11) are positive semidefinite kernels, since they are the product
of the Heaviside kernel and a positive semidefinite kernel. The family includes
oscillating functions of the type
$f(t_1 - t_2) = \sum_i d_i \cos(\omega_i (t_1 - t_2))$ that are clearly unstable,
but also widely used kernels like the Gaussian.

In view of their popularity, one might be tempted to adopt these kernels for 
system identification. However, a simple calculation shows that, unless $f = 0$,
kernels of the type (11) are never stable:

$$\int_{\mathbb{R}^+} \left| \int_{\mathbb{R}^+} f(|t_1 - t_2|)\, dt_1 \right| dt_2 = \int_{\mathbb{R}^+} \left| \int_{-t_2}^{+\infty} f(|\tau|)\, d\tau \right| dt_2 = +\infty.$$

Since BIBO stability is almost always satisfied in real-world systems, the class
of kernels (11) seems to be not well suited for system identification.
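
The divergence can be observed numerically for a Gaussian profile. The sketch below assumes the unit-width form $f(\tau) = e^{-\tau^2}$ (the width is an arbitrary choice for the example), uses the closed form of the inner integral in terms of the error function, and integrates it over growing horizons $[0, T]$. The function names are hypothetical.

```python
import math

def inner_integral(t2):
    """Integral over t1 >= 0 of exp(-(t1 - t2)^2), i.e. (sqrt(pi)/2) (1 + erf(t2))."""
    return 0.5 * math.sqrt(math.pi) * (1.0 + math.erf(t2))

def outer_integral(T, n=20000):
    """Midpoint-rule approximation of the integral of inner_integral over [0, T]."""
    dt = T / n
    return sum(inner_integral((j + 0.5) * dt) for j in range(n)) * dt

# The inner integral tends to sqrt(pi) instead of decaying, so the outer
# integral grows roughly linearly with the horizon T.
growth = [outer_integral(T) for T in (10.0, 100.0, 1000.0)]
```

Each tenfold increase of the horizon multiplies the value by roughly ten, which is the numerical signature of a non-integrable kernel.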



6 Conclusions 

Regularization techniques in RKHS are flexible tools for both discrete and
continuous-time LTI system identification. They make it possible to handle
datasets with sparse temporal sampling while incorporating several types of prior
information. As shown in this report, information such as BIBO stability, relative
degree, and smoothness can be encoded by designing simple properties of the 
kernel function. We have also discussed several examples of kernels, showing 
that some of them are well-suited to describe stable dynamics while others are 
not. 



Appendix 

Proof of Lemma 1

By the reproducing property, we have

$$h(t) = \langle K_t, h \rangle_{\mathcal{H}}.$$

If the kernel $K$ satisfies the condition of the Lemma, we have $K_t = 0$ for
$t < 0$, so that $h(t)$ equals zero for negative $t$. On the other hand, since
$K_\tau \in \mathcal{H}$ for all $\tau$, condition (3) implies

$$K_\tau(t) = K(t, \tau) = 0, \quad \forall t < 0, \quad \forall \tau \in \mathcal{T}.$$

In view of symmetry, it follows that the kernel must necessarily be in the form
defined by the Lemma.

□



Proof of Lemma 2

This is an immediate corollary of Proposition 4.2 in [3].

□

Proof of Lemma 3

If $K$ is integrable, for all $h \in L^{\infty}(\mathcal{T})$, we have

$$\int_{\mathcal{T}} \left| \int_{\mathcal{T}} K(t_1, t_2)\, h(t_1)\, dt_1 \right| dt_2 \le \|h\|_{\infty} \int_{\mathcal{T}} \int_{\mathcal{T}} |K(t_1, t_2)|\, dt_1\, dt_2 < +\infty.$$

□

Proof of Lemma 4

The proof is similar to that of Lemma 1. By the reproducing property, we have

$$h(t) = \langle K_t, h \rangle_{\mathcal{H}}.$$

If the kernel $K$ satisfies the condition of the Lemma, we have $K_t = 0$ for
$t < D$, so that $h(t)$ equals zero for $t < D$. On the other hand, since
$K_\tau \in \mathcal{H}$ for all $\tau$, the condition $h(t) = 0$ for $t < D$
implies

$$K_\tau(t) = K(t, \tau) = 0, \quad \forall t < D, \quad \forall \tau \in \mathcal{T}.$$

In view of symmetry, it follows that the kernel must necessarily be in the form
defined by the Lemma.

□

Proof of Lemma 5

The restriction of the kernel to $(0, +\infty)$ is $k$-times continuously
differentiable. Then, by Corollary 4.36 of [13], it follows that the restriction
of every function $h \in \mathcal{H}$ to the interval $(0, +\infty)$ is $k$-times
continuously differentiable, and point-wise evaluated derivatives are bounded
linear functionals.

□

Proof of Lemma 6

In view of Lemma 5, point-wise evaluated derivatives at any $t > 0$ are bounded
linear functionals. By the reproducing property, we have

$$\lim_{\tau \to 0^+} h^{(i)}(\tau) = \lim_{\tau \to 0^+} \langle K_\tau^{(i)}, h \rangle_{\mathcal{H}}.$$

If all the impulse responses $h \in \mathcal{H}$ have relative degree greater
than or equal to $k$, the left hand side is zero for all $h \in \mathcal{H}$ and
$i < k$. It follows that

$$\lim_{\tau \to 0^+} K_\tau^{(i)} = 0, \quad \forall i < k.$$

Condition (6) follows from the symmetry of the kernel. Conversely, if condition
(6) holds, we immediately obtain

$$\lim_{\tau \to 0^+} h^{(i)}(\tau) = 0, \quad \forall i < k, \quad \forall h \in \mathcal{H},$$

since the inner product is continuous. It follows that the relative degree of any
function $h$ of the space is greater than or equal to $k$.

□
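Proof of Lemma 7

With the change of variables $s_i = h_\omega(t_i) = e^{-\omega t_i}$, we have

$$\begin{aligned}
\int_{\mathbb{R}^+ \times \mathbb{R}^+} |K(t_1, t_2)|\, dt_1\, dt_2
&\le \int_{\mathbb{R}^+} \int_{\mathbb{R}^+ \times \mathbb{R}^+} (t_1 t_2)^k\, |G(h_\omega(t_1), h_\omega(t_2))|\, dt_1\, dt_2\, d\mu(\omega) \\
&= \int_{\mathbb{R}^+} \frac{d\mu(\omega)}{\omega^{2(1+k)}} \int_{[0,1]^2} |\ln s_1 \ln s_2|^k\, \frac{|G(s_1, s_2)|}{s_1 s_2}\, ds_1\, ds_2 \\
&\le C \int_{\mathbb{R}^+} \frac{d\mu(\omega)}{\omega^{2(1+k)}} \int_{[0,1]^2} |\ln s_1 \ln s_2|^k\, ds_1\, ds_2 \\
&= C\, (k!)^2 \int_{\mathbb{R}^+} \frac{d\mu(\omega)}{\omega^{2(1+k)}} < +\infty,
\end{aligned}$$

where the last integral is finite because the support of $\mu$ does not contain
the origin. The thesis follows by applying Lemma 3.

□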